Automated Japanese grapheme-phoneme alignment
Abstract
This paper describes an adaptation of the tf-idf model to Japanese grapheme-phoneme alignment, without reliance on training data. The tf-idf model is optionally complemented with affixation and conjugation handling modules, and determines frequencies through analysis of “alignment potential”. The proposed system achieved a maximum accuracy of 94.74% in evaluation.
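To make the idea of scoring alignments with tf-idf-style weights more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not define "alignment potential" or the exact weighting, so the functions `candidate_alignments`, `pair_scores`, and `best_alignment`, the scoring formula, and the three-word toy corpus are all hypothetical. The sketch assumes each grapheme character takes one contiguous, non-empty stretch of the reading, and it ignores affixation and conjugation handling.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def candidate_alignments(graphemes, phonemes):
    """Yield every split of `phonemes` into one non-empty chunk per grapheme."""
    n, m = len(graphemes), len(phonemes)
    for cuts in combinations(range(1, m), n - 1):
        bounds = (0, *cuts, m)
        yield [(graphemes[i], phonemes[bounds[i]:bounds[i + 1]]) for i in range(n)]


def pair_scores(corpus):
    """Toy tf-idf-style weight for each (grapheme, reading) pair.

    Hypothetical formula, not taken from the paper:
      tf  = share of a grapheme's candidate pairings that use this reading
      idf = log(1 + graphemes seen / graphemes that share this reading)
    """
    pair_freq = Counter()
    grapheme_freq = Counter()
    reading_spread = defaultdict(set)
    for graphemes, phonemes in corpus:
        for alignment in candidate_alignments(graphemes, phonemes):
            for grapheme, reading in alignment:
                pair_freq[(grapheme, reading)] += 1
                grapheme_freq[grapheme] += 1
                reading_spread[reading].add(grapheme)
    n_graphemes = len(grapheme_freq)
    return {
        (g, r): (count / grapheme_freq[g])
        * math.log(1 + n_graphemes / len(reading_spread[r]))
        for (g, r), count in pair_freq.items()
    }


def best_alignment(graphemes, phonemes, scores):
    """Pick the candidate whose pairs have the highest total weight."""
    return max(
        candidate_alignments(graphemes, phonemes),
        key=lambda alignment: sum(scores.get(pair, 0.0) for pair in alignment),
        default=None,
    )


# Toy corpus of (written form, kana reading) pairs; no labelled alignments.
corpus = [("山", "やま"), ("川", "かわ"), ("山川", "やまかわ")]
scores = pair_scores(corpus)
print(best_alignment("山川", "やまかわ", scores))
```

In this toy setting the single-character entries make the pairings 山/やま and 川/かわ more frequent than the competing splits, so the two-character word is segmented accordingly. Exhaustive enumeration grows quickly with word length, so a practical system would need pruning or dynamic programming.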
Similar resources
Efficient Grapheme-phoneme Alignment for Japanese
Current approaches to the grapheme-phoneme alignment problem for Japanese achieve good accuracy, but are extremely computationally expensive. In this paper we evaluate various modifications to previous algorithms for both the alignment and okurigana detection subtasks. The best algorithm achieved accuracy of 96.2% for the combined task on a limited data set, and was significantly more efficient...
The Applications Of Unsupervised Learning To Japanese Grapheme-Phoneme Alignment
In this paper, we adapt the TF-IDF model to the Japanese grapheme-phoneme alignment task, by way of a simple statistical model and an incremental learning method. In the incremental learning method, grapheme-phoneme alignment paradigms are disambiguated one at a time according to the relative plausibility of the highest scoring alignment schema, and the statistical model is re-trained accordin...
A Comparative Study of Unsupervised Grapheme-Phoneme Alignment Methods
This paper describes and compares two unsupervised algorithms to automatically align Japanese grapheme and phoneme strings, identifying segment-level correspondences between them. The first algorithm is inspired by the tf-idf model, including enhancements to handle phonological variation and determine frequency through analysis of “alignment potential”. The second algorithm relies on the C4.5 c...
A Language-Independent, Data-Oriented Architecture for Grapheme-to
We report on an implemented grapheme-to-phoneme conversion architecture. Given a set of examples (spelling words with their associated phonetic representation) in a language, a grapheme-to-phoneme conversion system is automatically produced for that language which takes as its input the spelling of words, and produces as its output the phonetic transcription according to the rules implicit in t...
PermA and Balloon: Tools for string alignment and text processing
Two online research tools are presented in this paper: PermA, a general-purpose string aligner which can, for example, be used for grapheme-to-phoneme and phoneme-to-phoneme alignment, and Balloon, a text processing toolkit for German and English providing components for part-of-speech tagging, morphological analyses, and grapheme-to-phoneme conversion including syllabification and word-stress ass...